INTERACTIVE DATA ANALYSIS EMPLOYING A KNOWLEDGE BASE

This is a continuation of application Ser. No. 07/972,785, filed on Nov. 6, 1992, now Pat. No. 5,659,724, issued Aug. 18, 1997.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to data analysis generally and more specifically to data analysis performed using knowledge base systems.

2. Description of the Prior Art

In the computer age, information is stored primarily in data base management systems. FIG. 1 is a schematic block diagram of a data base management system (DBMS) 101. System 101 is implemented using storage devices such as disk drives to store the information and processors coupled to the disk drives to access the data. In system 101, a query 103, which describes the information to be located, is presented to DBMS 101, which processes the query in query manager 107, locates the information in data base 117, and returns it as data 105. Query 103 describes the information to be located by using names. For example, a query in the SQL query language has the following general form:

    select <field names>
    from <table names>
    where <constraints that rows must satisfy>

Of course, the information in data base 117 is not located by names, but rather by means of addresses in whatever storage device data base 117 is implemented on. The relationship between the names used in the queries 103 and the addresses used in data base 117 is established by schema 113, which defines the names used in the queries in terms of the locations in data base 117 which contain the data referred to by the names.

Operation of data base management system 101 is as follows: Query 103 is received by query manager 107, which parses it. Query manager 107 presents the names 109 in query 103 to schema 113, which returns descriptors 111 describing the data represented by the names in data base 117. Query manager 107 then uses the descriptors and the query 103 to produce a stream of operations 112 which cause data base 117 to return the data 105 specified by query 103. Query manager 107 then returns the data 105 to the user who produced the query.

Data base management systems 101 are effective for storing and retrieving data; they do, however, have a number of problems. One of the problems is complexity; query languages such as SQL are not simple. Further, schema 113 in a large data base management system 101 is also complex. Effective formulation of queries 103 requires detailed understanding not only of the query language used in system 101 but also of the meanings of the names used in schema 113. For this reason, formulation of queries for system 101 is often left to specialists. The overhead involved here is considerable in any case and grows if different data base management systems 101 with different query languages are involved. Attempts to overcome the complexity of query writing have included techniques such as the following:

Forms which the user fills out interactively. The queries are generated from the forms.

Redefinition of the names used in schema 113 in terms of concepts familiar to the user of the system.

Natural language interfaces to data base management system 101.

A modern example of such techniques is BusinessObjects, in which an SQL expert relates forms employing terms with which the user is familiar to queries in the SQL query language. By filling out the forms, the user can generate SQL queries without knowing the SQL query language. While the above techniques are worthwhile, none of them is able to deal with situations in which the information of interest is contained in more than one kind of data base management system 101.

Another problem with data base management system 101 is the relative inflexibility of its organization. Changes to schema 113 may be made only by those intimately familiar with schema 113 and its relationship to data base 117. Indeed, in many systems 101, schema 113 is produced by compilation, and consequently, a change to schema 113 requires recompiling the entire data base management system 101. The inflexibility of the organization causes problems both for data base management system 101's design and for its later use. Because of the inflexibility of the organization, it is difficult and expensive to design schema 113 for a data base management system 101. In particular, it is difficult to use the technique of producing a prototype and experimenting with it to determine the best form for the final system. Because of the inflexibility of the organization, it is also difficult to access the data in data base 117 in ways unenvisioned in the original design of schema 113. This problem has become more important as the information in large data base management systems 101 has been used not only for its originally-intended purposes, but also as a resource for various kinds of research. Since the schema of the data base management system was set up for the original purpose, it is difficult to fashion queries which look at the information in the manner required for the research.

The above and other problems of data base management systems 101 may be solved by employing knowledge base management systems in conjunction with data base management systems. In the present context, the chief distinction between a knowledge base management system and a data base management system is this: in a data base management system, the designer of schema 113 uses his or her conceptual knowledge of the data in data base 117 to design schema 113; however, schema 113 and the query language do not reflect the conceptual knowledge. For example, in systems using SQL, queries specify data by specifying tables and rows and columns in the tables. In a knowledge base management system, on the other hand, both the equivalent to the schema and the language used to describe data reflect the conceptual knowledge. U.S. patent application 07/781,454, Borgida et al., Information Access Apparatus and Methods, filed Oct. 23, 1991, and assigned to the assignees of the present patent application, describes generally how a knowledge base management system may be used in conjunction with a data base system; the present patent application presents more detail concerning the uses and advantages of integrating knowledge base management systems with data base management systems.

SUMMARY OF THE INVENTION

The foregoing problems of prior-art data base management systems are solved by a virtual data base management system. The virtual data base management system includes

one or more data base management systems for receiving first queries and returning data in response thereto;

a knowledge base management system for organizing the data in a knowledge base according to a set of concepts
and operating on the data in response to expressions stated in a description language which employs the concepts;

means for receiving the expressions, translating the expressions into the first queries, receiving the data, and returning the data together with the expressions to the knowledge base management system for incorporation into the knowledge base; and

means for receiving second queries specifying certain of the data and responding thereto by translating the second queries into expressions specifying the certain data, providing the expressions to the knowledge base management system, receiving the certain data from the knowledge base management system, and providing the certain data.

The fact that the virtual data base management system includes a knowledge base management system gives the virtual data base management system the ability to perform novel operations including converting a query into a concept used in the knowledge base management system and tracking movement of an individual in the knowledge base management system from one category to another. As regards query conversion, that aspect of the invention may be summarized as follows: Apparatus for organizing a body of information including

a knowledge base wherein the body of information is represented by individuals and concepts which organize the individuals;

means coupled to the knowledge base for responding to a query specifying a collection of the individuals by making a collection specification which specifies the same collection of individuals and has a form compatible with the concepts; and

means coupled to the knowledge base for receiving the collection specification and integrating the collection specification into the concepts.

As regards tracking movement of an individual from one category to another, that aspect of the invention may be summarized as follows: Apparatus for detecting a change in a body of information including

a knowledge base wherein the body of information is represented by individuals and concepts which organize the individuals;

means for making an alteration with regard to one or more of the individuals;

means responsive to the alteration for making a reorganization of the individuals as required by the alteration and the concepts; and

means responsive to the reorganization for indicating an effect of the reorganization with regard to one or more of the individuals.

Finally, the virtual data base management system employs a technique for generating a query from a graph which may be employed in any kind of data base management system. In that aspect, the invention may be summarized like this: Apparatus for making a new query including

display means for displaying a graph based on information associated with individuals belonging to a first collection;

means corresponding to the display means for storing a first collection specification specifying the first collection and a query language expression for obtaining the information upon which the graph is based;

means coupled to the display means for making a specification of a portion of the graph; and

means responsive to the first collection specification, the query language expression, and the means for making a specification for making the new query, the new query specifying a second collection made up of the individuals with which the information in the portion is associated.

Other objects and advantages of the apparatus and methods disclosed herein will be apparent to those of ordinary skill in the art upon perusal of the following Drawing and Detailed Description, wherein:

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic block diagram of a prior art data base management system;

FIG. 2 is a schematic block diagram of an information retrieval system which uses a knowledge base management system in conjunction with data base management systems;

FIG. 3 is a block diagram of the virtual data base management system of FIG. 2;

FIG. 4 shows concept definitions and individual definitions;

FIG. 5 is a detail of virtual query manager 227;

FIG. 6 is a diagram of an example domain model;

FIG. 7 shows a table template and a table;

FIG. 8 shows segmentation using a graph;

FIG. 9 shows [illegible in source];

FIG. 10 shows a form;

FIG. 11 shows a set of windows used in the system of FIG. 2;

FIG. 12 is a diagram of user interaction with the system of FIG. 2;

FIG. 13 is a diagram showing how a query is derived from a graph;

FIG. 14 shows the windows used to define concepts from collections; and

FIG. 15 shows the windows used with monitors.

Reference numbers in the Drawing have two parts: the two least-significant digits are the number of an item in a figure; the remaining digits are the number of the figure in which the item first appears. Thus, an item with the reference number 201 first appears in FIG. 2.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

The following Detailed Description of a preferred embodiment will begin with an overview of the preferred embodiment and its operation and will then discuss areas of interest in more detail. Finally, the user interface for the preferred embodiment will be described in detail.

Overview of a Preferred Embodiment: FIG. 2

FIG. 2 is a block diagram of an information retrieval system 201 which uses a knowledge base management system in conjunction with one or more data base management systems 101. In essence, the knowledge base management system (KBMS) 217 is used to create a virtual data base management system (VDBMS) 215. The word virtual is used here in a sense similar to that in which it is used in the concept virtual memory system. A virtual memory system permits a programmer to address data by means of logical addresses which the system automatically translates into the physical addresses of the actual data. The programmer thus need have no notion of how the computer system
on which the program is run actually stores data. A virtual data base management system similarly obtains its data from one or more data base management systems (data base management systems 101(0 ... n) in FIG. 2), but both the schema and the query language used in the virtual data base management system are independent of the schemas and query languages used in the data base management systems. Further, because the schema in the virtual data base management system is independent of the schemas in the data base management systems containing the data, the schema in the virtual data base management system may be specifically tailored to the domain which the virtual data base management system is being used to investigate.

The use of a knowledge base management system to create the virtual data base management system provides additional advantages:

the schema is made using concepts pertinent to the domain being investigated, and the concepts may be used directly in the queries;

the knowledge base system can incorporate new concepts into the schema, which thus becomes dynamically extendable; and

changes in relationships between the concepts used in the schema and the data contained therein can be detected.

As will be explained in more detail below, these advantages make information retrieval system 201 substantially easier to use and substantially more flexible than prior-art information retrieval systems.

Continuing with the description of information retrieval system 201, in a presently-preferred embodiment, the first step in implementing information retrieval system 201 is to design a virtual schema 219 using concepts relevant to the research to be done. Once this is done, the techniques described in the Borgida, et al. patent application supra are used to load data 105 from one or more data base management systems 101 into virtual data base 221 of knowledge base management system 217. Loading is done by providing descriptions of the concepts in the schema in a description language (DL 223) used in knowledge base management system 217 to translator 226, which translates the descriptions into queries 103 as required for the relevant data base management systems 101. When the data is returned to translator 226, translator 226 provides the data, together with a description of it in description language 223 (arrow 224), to virtual data base management system 215. Knowledge base management system 217 then adds the data to virtual data base 221 as required by the descriptions. The presently-preferred embodiment of information retrieval system 201 is used in an environment in which only monthly updates of the data in virtual data base 221 are required; consequently, loading is done using a "batch" technique. In other environments in which updates must be made more frequently, loading could be done by having translator 226 retain the description language 223 descriptions, producing queries 103 from them at the required intervals, and providing the resulting data and descriptions to virtual data base management system 215. Alternatively, a user who had become aware of a relevant change in a data base management system 101 could request that the changed data be loaded into virtual data base 221.

Once virtual data base management system 215 is loaded, a user may employ graphical user interface 203 to query virtual data base management system 215 and see the results of the queries. Graphical user interface 203 includes a display 205, upon which the information required by the user is displayed in one or more windows. The user controls graphical user interface 203 and thereby information retrieval system 201 by means of keyboard 207 and pointing device 209. Inputs from the keyboard and pointing device, indicated by arrow 233, go to graphical user interface manager 229, which generates virtual data base commands (VDBC) 211 based on the inputs. The virtual data base commands 211 are provided to virtual query manager 227. Included in the virtual data base commands 211 are conceptual queries. A conceptual query is written in a query language which is specifically adapted to knowledge base management system 217 and which expresses the query in terms of the concepts employed in virtual schema 219. The conceptual query is thus independent of any of the query languages or schemas used in the data base management systems 101 and further employs concepts which are directly relevant to the research being undertaken.

Virtual query manager 227 converts the queries into operations 223 which can be executed by knowledge base management system 217; in response to the operations, knowledge base management system 217 returns a collection 225 of information from virtual data base 221. A collection as used herein is like a set, except that the collection may contain elements which are identical. For example, {a,b,c} is a set, while {a,a,b,c} is a collection. Information 213 based on collection 225 is then returned to graphical user interface manager 229, which uses it in windows in display 205, as indicated by arrow 231. For example, graphical user interface manager 229 might use data from collection 225 to make a graph which is displayed in a window in display 205.

Details of virtual data base management system 215: FIG. 3

FIG. 3 is a detailed block diagram of virtual data base management system 215 in a preferred embodiment. Beginning with knowledge base management system 217, knowledge base management system 217 is implemented using the CLASSIC description language-based knowledge base management system. Description language-based knowledge base management systems take descriptions of concepts or of individual objects which are written in a description language and classify the concepts or the individual objects, that is, they find their relationship to all of the concepts or individual objects which are already in the data base. Classification relies on the ability of the knowledge base management system to find a generalization (or subsumption) relationship between any pair of terms expressed in the description language. Classification finds all previously-specified descriptions that are more general than (i.e., that subsume) the new one, and all previously-specified descriptions that are more specific than (i.e., that are subsumed by) the new one. These systems can find which of the more general ones are most specific, and which of the more specific ones are the most general, and place the new one in between those. This yields a generalization ordering among the descriptions, that is, a partial ordering based on the subsumption relationship. The partial ordering may be thought of as a hierarchy, although most description languages permit any description to have multiple more general descriptions, and thus do not yield a strictly hierarchical ordering. Description language-based knowledge base management systems are described in R. J. Brachman and J. G. Schmolze, "An Overview of the KL-ONE Knowledge Representation System," Cognitive Science, vol. 9, No. 2, April-June 1985, pp. 171-216. The description language used in the CLASSIC system is described in R. J. Brachman, et al., "The CLASSIC User's Manual," AT&T Bell Laboratories Technical Report, 1991.

A CLASSIC knowledge base 320 has three main parts (see FIG. 3): (1) a set of concept definitions (Concs) 311; these are the named descriptions that are stored and organized by the CLASSIC KBMS. As mentioned above, they can be either primitive or compositional; (2) a set of binary relation definitions (Rels) 314; in CLASSIC these can be "roles", which can have more than one value (e.g., child), or "attributes", which can have only a single filler (e.g., age, mother); and (3) a set of individual object descriptions (INDS) 313, which characterize individual objects in the world in terms of the concept definitions 311 and which are related together by means of the role definitions 314. With regard to the relationship between FIG. 3 and FIG. 2, individuals 313 implement virtual data base 221 and concepts 311 and relationships 314 together implement virtual schema 219.

Examples of concepts and individual objects (hereinafter simply individuals) as they are expressed in description language 223 are given in FIG. 4. In concept definitions 401, the PERSON primitive concept definition 403 says that a person is, among other things (the qualification is the meaning of the "PRIMITIVE" construct), something with at most two parents, exactly 1 gender and exactly 1 age. The MOTHER compositional concept definition 405 equates the term MOTHER with the phrase "a person whose gender is exactly 'female' and who has at least one child". In the individual portion of the knowledge base we have assertions that individuals 407 satisfy named concepts 401, i.e., LIZ 409 satisfies the previously defined concept, MOTHER; and we also have assertions of the relationships between individuals 409 in terms of roles 314 such as age 41 (not shown, since they have no structure in this embodiment), such as [illegible in source]. [Illegible in source] is maintained by classifier (Class) 315, which classifies descriptions as set forth above. For example, if a new individual 409 who is a mother is added to individuals 313, it is classified under the MOTHER and PERSON concepts; similarly, if a new concept, such as FATHER, is added to concepts 311, it is classified with regard to the other concepts. Here, of course, it would be classified under PERSON. It should be noted at this point that the notions of individual and concept employed herein correspond to the notions of object and class employed in object-oriented systems.

The fact that virtual data base management system 215 employs a description language-based knowledge base management system such as CLASSIC gives it two important advantages over a standard data base management system. The first important advantage is that because the virtual schema 219 is implemented using concepts 311 and relations 314, it can be extended dynamically. All that is required to extend the virtual schema is to add a new concept to it. Classifier 315 is then able to integrate the new concept into the hierarchy of concepts in concepts 311. The second important advantage is that changes in the relationship between individuals 313 and concepts 311 are detectable. For example, when virtual data base 221 is updated, knowledge base management system 217 receives data and description language descriptions (arrow 224) in classifier 315, which then classifies the data as required by concepts 311; it can be determined from the classification operation whether more or fewer individuals were subsumed under a given concept than previously.

In a preferred embodiment, the fact that new concepts can be added to concepts 311 is used to make queries into concepts; that is, when a user of information retrieval system 201 defines a particularly interesting conceptual query, the collection returned by the conceptual query can be converted to a concept 403 and added to concepts 311, as shown by new concept (NCONC) arrow 317. The manner in which this is done will be described in more detail below.

In the preferred embodiment, the fact that changes in the relationship between concepts 311 and individuals 313 can be detected is used to provide a conceptual version of the triggers used in standard data base systems. A trigger is typically defined in terms of allowed values in a field; when the field is set to a value which is outside the defined limits, code associated with the trigger is executed. For example, a data base of checking accounts might have a [illegible in source] when the checking account balance goes below zero. Such conceptual triggers are termed herein monitors. They appear in FIG. 3 as monitors 305. Each monitor defines an action to be taken if reclassification of individuals 313 results in a given kind of change in the relationship of the individuals 313 to concepts 311. Monitors 305 monitor the reclassification performed by classifier 315, as indicated by arrow 307, and if the reclassification satisfies a monitor, the action defined in the monitor is taken. For example, if the concepts 311 include a concept WOMAN which is like MOTHER but not restricted by (AT-LEAST 1 children), then a monitor might detect movement of individuals from the concept WOMAN to the narrower concept MOTHER and define an action based on such movement.

As is apparent from the foregoing, a user at graphical user interface 203 can use virtual data base commands 211 to define concepts either directly or by specifying a collection to be converted into a concept (both possibilities appear in FIG. 3 as concept description (CD) 321), can define a conceptual query 319, and can define a monitor 305. In the case of a directly defined concept, classifier 315 does the reclassification necessary to add the new concept to concepts 311; in the case of a concept defined by means of a collection, query processor 301 makes a new concept 317 from the collection and provides it to classifier 315.

In the case of an input which defines a conceptual query 319, query processor 301 converts the conceptual query 319 into collection specification 318. Knowledge base management system 217 responds to collection specification 318 by performing operations which result in the return of a collection 225 to virtual query manager 227. Virtual query manager 227 retains collection 225 in saved collections 303 and uses it to produce output 213 to graphical user interface 203. Finally, a user at graphical user interface 203 may define a monitor in monitors 305. The definition includes both an action to be taken and the condition under which the action is to be taken. In the following, the techniques used to make collections into concepts and to define monitors will be described in more detail; in addition, a graphical technique for defining a query will be described.

Details of Query Processing: FIG. 5

FIG. 5 shows in more detail how queries are processed and concepts are made from collections in a preferred embodiment. Query processor 301 has two main components: query interpreter (QI) 501, which interprets conceptual queries 319, and collection specification processor (CP) 507, which provides collection specifications 511 to knowledge base management system 217. Such collection specifications 511 are provided for two purposes: so that knowledge base management system 217 returns the collection 225 corresponding to the concept and so that a collection specification can be named and added to concepts 311 as a new concept 317. Collections 225 are represented in saved collections 303 by collection objects (CO) 509. A collection object 509 always contains a collection specification 511 which describes the collection 225 in terms which may be interpreted by classifier 315 and may also contain collection individuals 513, the actual individuals from individuals 313 which make up collection 225 represented by collection object 509.
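As an illustration only, the two-part structure of a collection object described above might be sketched in Python as follows. The class name, field names, and the dictionary standing in for saved collections 303 are hypothetical; the source describes the structure but gives no implementation, and the specification is reduced here to a plain string.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CollectionObject:
    """Hypothetical stand-in for a collection object (CO) 509."""

    # Always present: the collection specification, simplified here to a
    # description-language string (an assumption made for this sketch).
    specification: str
    # Optional: the actual individuals. A list is used rather than a set
    # because a collection may contain identical elements.
    individuals: Optional[List[str]] = None

    @property
    def has_individuals(self) -> bool:
        """True once the individuals have been attached to the object."""
        return self.individuals is not None


# Hypothetical stand-in for saved collections 303, keyed by an arbitrary name.
saved_collections: Dict[str, CollectionObject] = {}

# A collection object may hold only its specification at first...
mothers = CollectionObject(specification="MOTHER")
saved_collections["mothers"] = mothers
holds_individuals_before = mothers.has_individuals

# ...and receive the individuals later.
mothers.individuals = ["LIZ"]
holds_individuals_after = mothers.has_individuals
```

Keeping the specification mandatory and the individuals optional mirrors the statement that a collection object always contains the former and may also contain the latter.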
Query processing proceeds as follows: a conceptual query 319 is defined by a user at graphical user interface 203; query interpreter 501 receives the conceptual query and makes an empty collection object 515 for the collection 225 specified by conceptual query 319. The empty collection object 515 contains only collection specifier 511 for the collection. Collection specifier 511 in a preferred embodiment consists of a description in description language 223 of one or more concepts in concepts 311 which contain the individuals specified in the conceptual query 319. If the collection is made up of fewer than all of the individuals included in the concepts, test functions in the collection specifier further limit the concepts so that only the individuals in the collection specified in conceptual query 319 are returned. In a preferred embodiment, the test functions are written in LISP. The test functions, which are a part of the CLASSIC knowledge base management system, are required because the language used for conceptual queries 319 is designed for ease of use in querying and is consequently more expressive than description language 223, which is designed for computational tractability in the classification operation. The algorithms used to translate a conceptual query 319 into a collection specifier 511 will be described in more detail below.

Empty collection object 515 is stored in saved collections 303. At a point in the query processing where the individuals in the collection specified by collection specifier 511 are actually required, collection processor 507 retrieves collection specifier 511 from empty collection object 515 and provides it to classifier 315. Classifier 315 classifies collection specifier 511 according to the concepts specified in the description language 223 portion of the collection description, then determines which individuals are specified by those concepts, and finally employs the test functions to select the desired individuals from the ones specified by the concepts. Those individuals make up the collection 225, which is added to the empty collection object 515 to make collection object 509, which contains not only collection specification 511, but collection individuals 513. Information from collection individuals 513 may then be used to generate displays in GUI 203, as indicated by arrow 213. Because collection specifier 511 is unnamed, it does not become a permanent part of concepts 311.

If a user of information retrieval system 201 finds a collection 225 to be particularly useful for analysis purposes, the user can make the collection specification 511 for the collection into a permanent part of concepts 311. To do this, the user provides a concept definition 321 at graphical interface 203. The concept definition includes a name for the concept and a specifier for the collection. If the collection has already been specified by a query and has a collection object 509 in saved collections 303, the concept definition need only specify the collection object; otherwise, it must specify a conceptual query 319. In the former case, collection processor 507 simply retrieves collection specification 511 from the specified collection object 509, associates specification 511 with the name, and provides the name and the specification as new concept 317 to classifier 315, which classifies it and adds it permanently to concepts 311. In the case where concept definition 321 specifies the

Details of Query Interpreter 501

Query interpreter 501 translates a conceptual query into a CLASSIC description language expression. The translation will be illustrated in the following for several simple cases. The following example assumes a simple domain model in which we have defined the concept PERSON and the attributes NAME and AGE. The most common conceptual queries are of a form that selects a subset of a collection, yielding another collection as a result; the form for this type of query is

    <var> IN <collection>
    WHERE <boolean-expression>

where IN and WHERE are keywords, <var> specifies a variable, <collection> a collection, and <boolean-expression> an expression used to select individuals from <collection> to be bound to the variable. Conceptually, this query iterates over the elements of <collection>, successively binding <var> to each element and evaluating <boolean-expression> in terms of that binding (i.e., the boolean expression is usually in terms of <var>). For example, we might write a query that selects a collection of those persons named Bob:

    x IN person
    WHERE x.name = Bob

This query can be expressed completely within the CLASSIC description language, so the collection produced as the result of this query is represented as an unnamed concept by the following CLASSIC expression:

    (and person
         (fills name Bob))

When the elements of this collection are requested, the concept expression is parsed and normalized to create an unnamed temporary concept in the knowledge base. The elements of the collection are the extent of this unclassified concept. By giving the collection a name (say, collection-1), we can refer to this collection in subsequent queries. For example, we might wish to find those Bobs over the age of 20:

    x IN collection-1
    WHERE x.age >= 20

becomes the unclassified concept

    (and person
         [remainder illegible in source]

One can take the naming of a collection a step further by explicitly placing it in the concept hierarchy as a classified

concept by means of a conceptual query 319, collection concept. The following query language statement creates a 

processor 507 provides the query to query interpreter 501, concept described by collection-1: 

which produces empty collection object 515 containing DEFINE_CONCEPT persons-named-bob WITH 

collection specification 511 corresponding (o the query. The collcction-1 

name for the concept is then associated with collection 65 This creates the classified concept persons-named-bob, 

specification 511 and the collection specification added to to which is stored in the Classic concept hierarchy like any 

concepts 311 as just described. other named concept. 
Since the query language is more expressive than the
CLASSIC description language, complete translation of a
query language expression into a CLASSIC expression is
not always possible. In such cases, the portions of the query
inexpressible in the CLASSIC description language are
translated into executable Common Lisp code, which is
embodied in a Classic test function. Even in the cases where
the translation must fall back on the use of test functions,
the collection can still be restricted to the most specific
parent in the concept hierarchy, restricting the number of
knowledge base individuals upon which the test function
must be run. For example, suppose we had asked for all
persons who have been working for more than half their
lives:

x IN person
WHERE x.years-on-job/x.age > 0.5

In this case, the concept representing this collection is
defined with the help of a Classic test function:

(and person
     (test-c '(lambda (x)
                (> (/ (filler x 'years-on-job)
                      (filler x 'age))
                   0.5))))

The foregoing translations are implemented using tech- 
niques well-known in the compiler and interpreter arts. The 
tokens of the conceptual query are lexed, the meanings of 
concepts, roles, and attributes are obtained from concepts 
311 and relationships 314, and then the description language 
statements and test functions which will generate the col- 
lection specified by the conceptual query are generated. 
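
The two-stage selection described above, in which candidates are first restricted to the most specific expressible parent concept and a test function is then run only over those candidates, can be sketched as follows. This is a minimal illustration in Python rather than the Common Lisp of the embodiment; the dictionary representation of individuals and the names `select_collection`, `in_person`, and `works_over_half_life` are assumptions made for the example and do not appear in the specification.

```python
from typing import Callable, Iterable, List

Individual = dict  # hypothetical representation: role/attribute name -> value


def select_collection(
    individuals: Iterable[Individual],
    in_parent_concept: Callable[[Individual], bool],
    test_function: Callable[[Individual], bool],
) -> List[Individual]:
    """Return members of the parent concept that also pass the test function."""
    # Stage 1: restrict to the extent of the most specific parent concept.
    candidates = [ind for ind in individuals if in_parent_concept(ind)]
    # Stage 2: run the test function only on those candidates.
    return [ind for ind in candidates if test_function(ind)]


def in_person(ind: Individual) -> bool:
    # Stand-in for classification under the concept "person".
    return ind.get("concept") == "person"


def works_over_half_life(ind: Individual) -> bool:
    # Mirrors the query condition x.years-on-job / x.age > 0.5.
    return ind["years-on-job"] / ind["age"] > 0.5


knowledge_base = [
    {"concept": "person", "name": "Ann", "age": 40, "years-on-job": 25},
    {"concept": "person", "name": "Bob", "age": 30, "years-on-job": 5},
    {"concept": "item", "name": "Toaster"},
]

selected = select_collection(knowledge_base, in_person, works_over_half_life)
names = [ind["name"] for ind in selected]
```

Because the item individual is filtered out in the first stage, the test function never sees an individual that lacks the `age` and `years-on-job` attributes.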
Details of Monitors 305: FIG. 9 

As previously mentioned, monitors 305 monitor changes
which occur in knowledge base 320 and perform actions
based on those changes. For example, suppose that the
concept Customer is divided into the sub-concepts
High-Spenders, Medium-Spenders, and Low-Spenders, and
suppose that the definitions of High-Spenders,
Medium-Spenders, and Low-Spenders are as follows (these are
informal definitions):

High-Spenders: Customers who average more than $100 
in monthly spending 

Medium-Spenders: Customers who average more than 
$20 but less than $100 in monthly spending 

Low-Spenders: Customers who average less than $20 in 
monthly spending 
Suppose that for the first six months of the year, the
customer Joe Smith spent a total of $300, a monthly average
of $50. Consequently, after six months, he would be
classified as a Medium-Spender. If, however, he were to make
a $470 purchase in the seventh month, his monthly average
would go up to $110 ($770 over seven months), and he would
be automatically reclassified as a High-Spender.
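
The arithmetic behind this reclassification can be checked with a short sketch. The function name `classify_spender` is hypothetical, and because the informal definitions leave averages of exactly $20 and exactly $100 unassigned, the sketch assumes those boundary values fall into the lower class.

```python
def classify_spender(monthly_average: float) -> str:
    """Classify a customer by average monthly spending (informal definitions)."""
    if monthly_average > 100:
        return "High-Spender"
    if monthly_average > 20:
        # Assumption: an average of exactly $100 is treated as Medium.
        return "Medium-Spender"
    # Assumption: an average of exactly $20 is treated as Low.
    return "Low-Spender"


average_after_six_months = 300 / 6            # total of $300 over six months
average_after_seven_months = (300 + 470) / 7  # $470 purchase in month seven

class_after_six = classify_spender(average_after_six_months)
class_after_seven = classify_spender(average_after_seven_months)
```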

In a data analysis application, it is particularly useful not 
just for individuals to be reclassified, but for an analyst to be 
able to keep track of changes in the classification of indi- 
viduals over time. That is, the analyst might want to know 
which customers have just become High-Spenders, perhaps 
in order to add them to a certain mailing list. In the current 
preferred embodiment, updates are applied to the knowledge
base 320 once a month. Information management system 
201 permits analysts to specify which changes the system 
should monitor for during the monthly update. If these 
changes occur, the analyst is notified. Examples of changes 
the analyst could request to be monitored include: 






Whenever a customer becomes a High-Spender, I want to 
be notified. 

Whenever the number of Low-Spenders increases by 
10%, I want to be notified. 

Monitor all migrations of customers among the concepts 
High-Spenders, Medium-Spenders, and Low- 
Spenders. 

Then, when the knowledge base 320 is updated and indi- 
viduals 313 are reclassified, IMACS checks to see whether 
any of these monitoring conditions are satisfied. If so, the 
analyst is notified. The graphical user interface for defining 
monitors 901 and receiving notifications is described in the 
discussion of the user interface below. 

FIG. 9 presents the detailed structure of a monitor 901 in 
monitors 305. Monitor 901 consists of three major parts: 

code for a condition to monitor for (Triggering Condition 
903), 

a collection of the individuals (IND) 909 (0 . . . n) that
satisfy the monitored condition (Collected Individuals 
905), and 

code for conditions under which to notify the analyst 
(Notification Conditions 907). 
The triggering condition 903 for a monitor could be an 
arbitrary function. However, we have found a restricted set 
of conditions to be particularly useful, and we list these for 
the sake of illustration: 

a change FROM one concept TO another (TRANSITION);
example: FROM High-Spenders TO Low-Spenders

a change FROM one concept (OUT MIGRATION);
example: FROM High-Spenders

a change TO a concept (IN MIGRATION);
example: TO Low-Spenders

The collected individuals 905 is simply a collection of
individuals 909 that (during a particular monthly update to 
the knowledge base 320) satisfy the triggering condition 
903. Like other collections in information retrieval system 
201, collected individuals 905 is a first-class object. 

After an update to the knowledge base is completed, all 
the monitors 901 are examined to determine whether the 
notification conditions 907 are satisfied. If so, the analyst is 
notified, as indicated by arrow 915. Two types of criteria that 
we have found useful are: 

a specified NUMBER of individuals changed;
examples: 10 individuals changed FROM High-Spenders TO
Low-Spenders; 5 individuals changed TO Low-Spenders

a specified PERCENTAGE of individuals changed;
examples: 10% of all High-Spenders became Low-Spenders;
the number of Low-Spenders increased by 20%

We now can state the monitoring algorithm very simply. 

1. While applying updates to the knowledge base 320 do 
(a) For each individual I that is updated do 

i. Record the concept(s) OLDP to which I currently
belongs (arrow before reclassification (BR) 911);

ii. Reclassify the individual I;

iii. Record the concept(s) NEWP to which I now
belongs (arrow after reclassification (AR) 913);

iv. Monitor-Change (I, OLDP, NEWP) (see details
below).

2. After applying all updates to the knowledge base 320 
do 

(a) For each monitor 901 M do 

i. If the notification conditions 907 are satisfied, then 
notify the analyst that the collected individuals 
905 changed as specified by the triggering condi- 
tion 903. 
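
The monitoring algorithm, including the Monitor-Change step it calls, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the embodiment's implementation: the `Monitor` class, the `in_migration` helper, the `reclassify` callback, and the sample data are all hypothetical names introduced here, and concepts are represented simply as sets of concept names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Concepts = Set[str]


@dataclass
class Monitor:
    name: str
    # Triggering condition 903: is the change OLDP -> NEWP of interest?
    triggering_condition: Callable[[Concepts, Concepts], bool]
    # Notification condition 907: should the analyst be notified?
    notification_condition: Callable[[List[str]], bool]
    # Collected individuals 905.
    collected: List[str] = field(default_factory=list)


def in_migration(target: str) -> Callable[[Concepts, Concepts], bool]:
    """Triggering condition for 'a change TO a concept' (IN MIGRATION)."""
    return lambda oldp, newp: target not in oldp and target in newp


def monitor_change(monitors: List[Monitor], individual: str,
                   oldp: Concepts, newp: Concepts) -> None:
    """Add the individual to every monitor whose triggering condition holds."""
    for monitor in monitors:
        if monitor.triggering_condition(oldp, newp):
            monitor.collected.append(individual)


def apply_updates(updates: Dict[str, float], membership: Dict[str, Concepts],
                  reclassify: Callable[[float], Concepts],
                  monitors: List[Monitor]) -> List[str]:
    """Apply all updates, then return the names of monitors that notify."""
    for individual, new_data in updates.items():
        oldp = set(membership.get(individual, set()))  # step i
        newp = reclassify(new_data)                    # steps ii and iii
        membership[individual] = newp
        monitor_change(monitors, individual, oldp, newp)  # step iv
    return [m.name for m in monitors if m.notification_condition(m.collected)]


def spender_class(monthly_average: float) -> Concepts:
    # Hypothetical reclassification by average monthly spending.
    if monthly_average > 100:
        return {"High-Spenders"}
    if monthly_average > 20:
        return {"Medium-Spenders"}
    return {"Low-Spenders"}


membership = {"Joe Smith": {"Medium-Spenders"}, "Ann Lee": {"Low-Spenders"}}
monitors = [Monitor("new high spenders", in_migration("High-Spenders"),
                    lambda collected: len(collected) >= 1)]
notified = apply_updates({"Joe Smith": 110.0, "Ann Lee": 15.0},
                         membership, spender_class, monitors)
```

Keeping the triggering and notification conditions as separate callables mirrors the three-part structure of monitor 901: the first decides which individuals are collected during the update, and the second is consulted only once, after all updates have been applied.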
The algorithm for Monitor-Change (I, OLDP, NEWP) is as
follows:

1. For every monitor 901 M do

2. If the transition from OLDP to NEWP satisfies the
triggering condition 903, then add the individual I to
the collected individuals 905 of that monitor.

Interactions of Users with Information Retrieval System 201

As previously indicated, a user of information retrieval
system 201 interacts with system 201 by means of graphical
user interface 203. The following discussion will explain
that interaction in some detail. The discussion will use an
example in which information retrieval system 201 is used
to perform research on the behavior of a department store's
customers. Virtual schema 219 in the example is made up of
the concepts, roles, and attributes of a department store
domain model. FIG. 6 shows this domain model 601. At the
top of the hierarchy formed by the domain model is
DEPARTMENT-STORE-THING 603, a concept that functions
simply as the root of the hierarchy. The concepts 403
PURCHASE, ITEM, DEPARTMENT, and SALE are all
subsumed directly under DEPARTMENT-STORE-THING
and SALE-PURCHASE 607 is subsumed under
PURCHASE, as shown by the broad arrows. Some of the
concepts have roles which relate them to other concepts. A
role is indicated by a narrow arrow which relates the role to
the other concept. For example, consider CUSTOMER.
CLASSIC specifies that role 606 purchases must be filled by
individuals belonging to the PURCHASE concept. The
remainder of the list associated with CUSTOMER specifies
attributes. Attributes 605 indicate information about
individuals belonging to the concept which is not related to other
concepts. As previously mentioned, test functions can be
associated with a concept to define properties of the concept
that cannot be expressed in the CLASSIC description
language 223. In this domain model, the definition of
SALE-PURCHASE uses a test function 609 that examines the date
of a purchase to see if the purchase occurred during a sale.

Internally, a concept is defined by means of a data
structure like that shown at 611 for ITEM. The concept's
name is defined by a string in the name field, the department
role is defined in the department field, and the remainder of
the fields define attributes and specify limits on the values
which the attributes may have.

When a user of information retrieval system analyzes the
data available to the system, the analysis involves four tasks:

viewing data in different ways, including concept
definitions, aggregate properties of concepts, tables of
individuals, and graphs;

segmenting data into subsets of analytic interest;

defining new CLASSIC concepts from a segmentation;

monitoring changes in the size and makeup of concepts
that result from incremental updates from the
databases.

The remainder of this section will illustrate how the
interface supports each of the tasks with usage scenarios from the
department store domain and will show how the interface
combines power and ease of use, supports the practical
interaction of the users' tasks, and supports the users in
managing their work over time.

Viewing Data

An analyst views data first, to "get a feel for the data", e.g.,
to determine the attributes that characterize a customer, the
average amount customers spend, or the amount spent by
particular customers, and second, to formulate questions to
be investigated, e.g. "Is there any correlation between the
percentage of purchases customers make during sales and
the total amount they spend?"

A necessary part of analyzing data is selecting
characteristics of the data to view. For example, an analyst might want
to see a table of customers which showed the total amount
spent, the number of purchases made, and the percentage of
purchases that were made during sales. Such a table is
termed a view of the data. In order for a data base
management system to be useful, the system must be able to provide
views which combine data from many of the underlying
tables. The views may be tables, or they may employ other
display techniques. For example, to determine the
percentage of purchases a customer made during sales would
involve accessing the value of the purchases role for the
customer, determining which purchases were
SALE-PURCHASES, then dividing the number of sale purchases
by the total number of purchases.

These considerations led to a decision that all views
should be driven from templates, declarative specifications
of the data to be displayed, and that all such templates should
be user-editable. FIG. 7 shows a template 703 and a table
view 701 corresponding to the template 703. (While the use
of templates is described here for tables, templates may be
used for other kinds of views as well.) Each template 703 for
a table view consists of a set of column headings 707 which
define the columns to be displayed in table view 701 and a
conceptual query language expression 713 which defines
what is to be displayed in the column specified at 710. Field
711, finally, defines the variable to be used in conceptual
query language expression 713. Control of template 703 is
by means of buttons 705.

Use of templates 703 is as follows: when a domain model
601 is created, a set of templates is made which provides
basic views of data in the domain for domain model 601.
Analysts then use these templates to construct other
templates as required for their work. Particularly useful
templates 703 may be saved for use by others. For example,
template 703 originally specified a view which indicated for
each customer the amount spent and the number of
purchases. When the analyst selected the original template, a
table corresponding to the original template was displayed.
The analyst using the original template decided, however,
that he wanted to see what percentage of those purchases for
each customer were made at sales. To see that, the analyst
edited the original template. He began the editing operation
by pressing the "edit template" button in table 701. In the
editing operation, he added the column % sales purchases
and specified conceptual query expression 713 for that
column:

count (z in <x>.purchases
       where z in sale-purchase)/
count (<x>.purchases)*100

Query expression 713 finds the number of purchases for
each customer and the number of sales purchases, divides
the number of sales purchases by the number of purchases,
and then multiplies by 100 to achieve the desired percentage.
When the analyst was done editing template 703, he
selected the "Done" button 705 to indicate that fact and selected
the "use template for this window" button to generate view
701 corresponding to the edited template 703. If the analyst
finds the edited template useful, the analyst can select the
"Save changes to template" button to save the changes and
thereby to produce a new template 703 which is available to
others for use and further editing. If the edited template is
not useful, the "Reset Example" button permits the analyst
to get back to the original template. In the above example,
the template only involves a single level of the concept
hierarchy. Where more than one level is involved, templates
are inherited down the concept hierarchy and are composed 
to determine the complete view for a particular table: if the 
analyst asks to see a table of the instances of CUSTOMER, 
and CUSTOMER is a specialization of PERSON, the tem- 
plates for both PERSON and CUSTOMER would be used to 
construct the table. 
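
One way to picture this composition is to walk from the requested concept up to the root and concatenate the column definitions found along the way. The sketch below is only an assumption about how such composition might work: the `PARENT` and `TEMPLATES` tables, the column names, and the choice to list the more general concept's columns first are all hypothetical and are not stated in the specification.

```python
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]  # (column heading, conceptual query expression)

# Hypothetical fragment of a concept hierarchy: child -> parent.
PARENT: Dict[str, Optional[str]] = {"CUSTOMER": "PERSON", "PERSON": None}

# Hypothetical templates attached to individual concepts.
TEMPLATES: Dict[str, List[Column]] = {
    "PERSON": [("name", "<x>.name"), ("age", "<x>.age")],
    "CUSTOMER": [("amount spent", "<x>.amount-spent")],
}


def compose_template(concept: str) -> List[Column]:
    """Collect columns from the concept and all of its ancestors."""
    chain: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    columns: List[Column] = []
    # Assumption: columns inherited from general concepts come first.
    for name in reversed(chain):
        columns.extend(TEMPLATES.get(name, []))
    return columns


headings = [heading for heading, _ in compose_template("CUSTOMER")]
```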

Note that the template-based scheme does not require 
extra work of an analyst: for all but the simplest views, the 
analyst must select certain characteristics of the data to view. 
And the work of creating a template benefits both its creator
and other analysts in the future. As mentioned, one of the 
shortcomings of current tools for data analysis is that they do 
not support management of work over time. In other words, 
the work of viewing and segmenting data that is done as part 
of one analysis is not available for use in another analysis. 
The template-based view scheme also affords important 
opportunities for division of labor and cooperation with 
other analysts. First, while at least one analyst working in a 
particular domain must be familiar with the template editing 
tool and the conceptual query language to create appropriate 
templates, other analysts can use these templates once they 
are constructed. Second, when other analysts need to view
data somewhat differently from what existing templates provide,
their task is to edit an existing template, rather than create 
one from scratch. Since only a small part of the complete 
conceptual query language expression is required for the 
edit, a far lower skill level at composing conceptual query 
language expressions is required. The templates thus serve 
as a point of cognitive contact among users that encourages 
natural division of labor and task-centered, as-needed learn- 
ing. 

In addition to seeing a view as a table, an analyst can see
the view as various types of graphs and plots, for example, 
a plot of the individuals in a table based on the values in a 
particular column of the table. FIG. 8 shows a plot 801 of 
customers based on percent of sale purchases. All of the 
customers are listed on the x axis in order of decreasing 
percent of sale purchases and the y axis shows the percent 
of sale purchases for each customer. 
Segmentation of Data 

The purpose of segmenting data is to create subsets of 
analytic interest, e.g., customers who buy mostly during 
sales, or high spending customers, or customers with high 
credit limits. The presumption is that useful generalizations 
can be made about such subsets, e.g., that they may respond 
well to certain sales or are more likely to get behind in their 
payments. Viewing and segmenting are interwoven tasks: 
viewing data initially suggests hypotheses and questions, 
segmenting the data puts these hypotheses into a testable 
form (by forming categories over which the hypotheses may 
or may not hold), then further viewing of the segments tests 
the hypotheses. It is fundamental to the flexibility of infor- 
mation retrieval system 201 that all collections are first-class 
objects. That is, the same operations can be performed on a 
collection produced by a further segmentation of a given 
collection that could be performed on the given collection. 
For example, if a first segmentation reveals further interest- 
ing properties, a second segmentation may be made of the 
first segmentation. 

Information retrieval system 201 provides 3 ways to 
segment data: with conceptual queries, with forms 
(abstracted from queries), and from graphs. Each method 
has its advantages. The power of a general-purpose query 
language is necessary since it is impossible to anticipate 
every way that analysts will want to segment data. On the 
other hand, it is possible to recognize routine segmentation 
methods in a domain, and this is where forms come in. 
Segmentation using Graphs: FIGS. 8 and 13

Graphs afford natural opportunities for segmenting data as
breaks in a graph suggest segment boundaries. Two such
breaks appear in graph 801. The analyst can indicate
segmentation points in a graph with a mouse click; vertical lines
811 and 813 show the segmentation points, and the
horizontal dotted lines show the boundary elements from the
data vector. Thus, graph 801 indicates a segmentation of
CUSTOMERS into those with percent of sale purchases
greater than 40, between 15 and 40, and less than 15.
Selecting the "Segment Based on Intervals" button 815
causes information retrieval system 201 to generate queries
which will result in the desired segmentation and brings up
a menu 805 that presents English paraphrases 807 of the
queries that will be generated to segment the data and has
fields 809 which the analyst can use to name the segments.
To actually perform the segmentation, the analyst selects
segment button 817.

It is possible to segment from a graph of a column from
a table of individuals because the column was defined by a
conceptual query language expression. In the example we
have been considering, the column "% sale purchases" was
defined by the expression:

COUNT (z in <x>.purchases
       where z in SALE-PURCHASE)/
COUNT (<x>.purchases)*100

From this conceptual query language expression and from
the segmentation points indicated by the analyst, queries to
segment CUSTOMERS into those with percent of sale
purchases greater than 40, between 15 and 40, and less than
15 are generated automatically. For example, the query that
defines the second segment is:

x in CUSTOMER where
(COUNT (z in <x>.purchases
        where z in SALE-PURCHASE)/
 COUNT (<x>.purchases)*100) >= 15 AND
(COUNT (z in <x>.purchases
        where z in SALE-PURCHASE)/
 COUNT (<x>.purchases)*100) < 40

When the segmentation is done, table 803 appears, which
lists the segments in the order in which they appear in the
graph and the number of customers in each segment.
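
The automatic generation of one query per interval can be sketched as a small string-building routine. This is a hedged illustration, not the system's actual generator: the function name `segment_queries` is hypothetical, and the treatment of the boundary values (lower bound inclusive, upper bound exclusive) is an assumption carried over from the second-segment query shown above.

```python
from typing import List, Optional, Sequence


def segment_queries(collection: str, qle: str,
                    cut_points: Sequence[float], var: str = "x") -> List[str]:
    """Build one conceptual query string per interval between cut points."""
    bounds: List[Optional[float]] = [None, *sorted(cut_points), None]
    queries = []
    for lower, upper in zip(bounds, bounds[1:]):
        conditions = []
        if lower is not None:
            conditions.append(f"({qle}) >= {lower}")  # inclusive lower bound
        if upper is not None:
            conditions.append(f"({qle}) < {upper}")   # exclusive upper bound
        queries.append(f"{var} in {collection} where " + " AND ".join(conditions))
    return queries


PERCENT_SALE_PURCHASES = (
    "COUNT (z in <x>.purchases where z in SALE-PURCHASE)/ "
    "COUNT (<x>.purchases)*100"
)

queries = segment_queries("CUSTOMER", PERCENT_SALE_PURCHASES, [15, 40])
```

With the two segmentation points 15 and 40, the routine yields three queries, ordered from the lowest interval to the highest.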

The above technique depends on a feature of the user
interface: for each graph, table, or the like which graphical
user interface manager 229 displays in display 205, manager
229 maintains an associated data structure. Thus, as shown
in FIG. 13, manager 229 maintains table record 1303
corresponding to table 701 in display 205 and graph record
1325 corresponding to graph 801. The associations are
indicated in FIG. 13 by dashed lines.

One of the primary purposes of this record is to enable the
graphical displays to be "live", i.e., for a user to be able to
get more information about the numbers, graphics, etc. For
that reason, each associated record contains a collection
object 1301 specifying the collection from which the table or
graph is generated and the conceptual query expressions
1311 used to generate the graph or table. Thus, table record
1303 records (among other things):

the collection object 1301 which defines the collection
from which information about individuals is being
displayed; and

for each column in the table, the query language
expression 1311 that defined the data in this column.
So, table record 1303 for table 701 described above would
include the following information:

Table-Record 1303
Collection Object 1301 - Customer
QLE 1311(0) -
    x.amount-spent
QLE 1311(1) - [...]
QLE 1311(2) -
    COUNT (z in x.purchases where
           z in Sale-Purchase)/
    COUNT (x.purchases)*100

Users can perform many operations on the data displayed in
a table, including examining all the data for a particular
individual and sorting the table based on a particular
column. What is relevant here is that users also may request a
graph of the data in a particular column, like "% sale
purchases". (Note: in order to graph the data in a column, the
data must be numeric, and must be sorted). FIG. 8 shows
graph 801 for the "% sale purchases" column of Customers.
To make the graph, graph manager 1321 proceeds as
follows: the user selects a column from table 701, as indicated
by the graph column request (GCR) arrow 1313. Graph
manager 1321 responds to the selection by reading the
conceptual query expression 1311(i) for the relevant column
from table record 1303 for table 701 (arrow 1304), using the
conceptual query expression 1311(i) to obtain the relevant
information from the individuals in the collection specified
by collection object 1301 in table record 1303, and then
making graph 801 and graph record 1325. Graph record
1325 contains (among other things) conceptual query
expression 1311(i) and collection object 1301 from table
record 1303.

For example, the graph record for the graph in FIG. 8
would include the following information:

Collection object 1301 - Customer
QLE 1311 -
    COUNT (z in x.purchases where
           z in Sale-Purchase)/
    COUNT (x.purchases)*100

How does the system generate the queries from the graph?
In response to a segmentation request 1315 from the user,
graph manager 1321 reads graph record 1325, which shows
that

1. the collection to be segmented was Customer

2. and the query language expression 1311 that generated
the data values was

count (z in x.purchases where
       z in Sale-Purchase)/
count (x.purchases)*100

The segment request 1315 further indicated that the lower
bound for the segment "sale customers" was 40.

Using the specification of the collection in collection
object 1301, the query language expression 1311(i) that
generated the data values, and segmentation request 1315, as
indicated by arrows 1315 and 1317, the system generates the
following conceptual query 319:

C in Customer
where ( COUNT (z in C.purchases where
               z in Sale-Purchase)/
        COUNT (C.purchases)*100 )
      >= 40

where C is a system-generated variable name and Customer
is understood to be the collection specified in collection
object 1301. As pointed out earlier, in a preferred
embodiment, the collection is specified using description
language 223. Note that all occurrences of the free variable
"x" in the query language expression 1311 (which ranged
over individuals in the table) were replaced by the new
variable name "C".

In general, suppose the system needs to construct a query
for the set S, the query language expression QLE, and
user-specified lower bound LB, and upper-bound UB. The
query will be of the form:

VAR in S
where QLE(VAR) >= LB AND
      QLE(VAR) < UB

where VAR is a system generated variable name, S specifies
a collection, and the notation QLE(VAR) means that the free
variable in QLE has been replaced by VAR.

While we have shown only how to construct queries from
graphs of a single column from a table (i.e., defined by a
single query language expression), this scheme generalizes
to graphs that show multiple columns. For example, suppose
we have a two dimensional graph where the x coordinate
plots data from column C-1 of a table (defined by QLE-1),
and the y coordinate plots data from column C-2 of a table
(defined by QLE-2). Then the user could indicate a segment
by specifying a rectangle on the graph. If the rectangle was
bounded by x coordinates X-MAX and X-MIN and y
coordinates Y-MAX and Y-MIN, the query that the system
would generate would be

VAR in S
where QLE-1(VAR) >= X-MIN AND
      QLE-1(VAR) < X-MAX and
      QLE-2(VAR) >= Y-MIN and
      QLE-2(VAR) < Y-MAX

It should be pointed out that the foregoing technique
is by no means limited in its application to virtual data
base management systems, but can be applied in standard
data base management systems as well.

[...] queries employed in a
domain, e.g., segmenting the instances of a concept by the
amount of change in a vector attribute (like purchase
history) of each instance. The most important aspect of these
forms is that they are all derived from queries in the query
language by replacing parts of the queries by variables.
Forms may be defined in two ways: when a particular data
retrieval application is designed, the most common queries
are made into forms and [...] system [...] however, [...]
an ad-hoc query in the query language that they then realize
is generally useful, a simple "abstraction" window guides
them through the process of creating a form from the query.
The observations made about view templates as reusable
resources and media for cooperation apply to forms as well.
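
The general query form for graph-based segmentation, in which the free variable of each stored query language expression is replaced by a system-generated variable and one pair of bounds is added per plotted column, can be sketched as below. This is a minimal sketch under stated assumptions: the `GraphRecord` class, the `substitute` and `range_query` helpers, the example attribute `x.age`, and the numeric bounds are hypothetical, and the record is reduced to just a collection name and a list of expressions.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GraphRecord:
    collection: str   # stands in for collection object 1301
    qles: List[str]   # one query language expression per plotted column


def substitute(qle: str, var: str, free_var: str = "x") -> str:
    """Replace every occurrence of the free variable by the new variable."""
    return re.sub(rf"\b{re.escape(free_var)}\b", var, qle)


def range_query(record: GraphRecord,
                bounds: Sequence[Tuple[float, float]],
                var: str = "VAR") -> str:
    """Build 'VAR in S where QLE(VAR) >= LB AND QLE(VAR) < UB ...'."""
    conditions = []
    for qle, (lower, upper) in zip(record.qles, bounds):
        expr = substitute(qle, var)
        conditions.append(f"({expr}) >= {lower}")
        conditions.append(f"({expr}) < {upper}")
    return f"{var} in {record.collection} where " + " AND ".join(conditions)


# Two plotted columns, so the bounds describe a rectangle on the graph.
record = GraphRecord("Customer", ["x.amount-spent", "x.age"])
query = range_query(record, [(500, 1500), (20, 40)], var="C")
```

A single-column graph is the special case in which the record holds one expression and one pair of bounds.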
FIG. 10 shows a form 1001 being filled out that will
segment customers' purchases by the department of the item
purchased; the resulting table 1011 might lead the analyst to
look for correlations among departments in which customers
make their purchases. In form 1001, the analyst specifies
iteration over all DEPARTMENTS and CUSTOMERS in
field 1004; in field 1003, the analyst specifies the variables
which will represent the DEPARTMENTS and CUSTOMERS
in the queries generated from the form; as set out at
fields 1005 and 1007, the independent variable is C, standing
for CUSTOMERS and the dependent variable is D. The
connection between CUSTOMER and DEPARTMENT is
specified by fields 1013 and 1015; field 1013 specifies the
chain of roles that relates the two: the role purchases in
CUSTOMER refers to the concept PURCHASE, which in
turn has the role item which refers to the concept ITEM.
Within the concept ITEM, the concept DEPARTMENT is
referred to by the role department, as set forth in field 1015.
When "apply" button 1009 is specified, query processor 301
generates one query for each possible pairing of
DEPARTMENT and CUSTOMER individuals. A typical query would
be:

x in Joe-Smith.purchases.item where
x.department = Appliances

Defining Concepts: FIG. 14

FIG. 14 shows the windows used in a preferred
embodiment of information retrieval system 201 to define a concept
from a collection. There are two techniques: defining
segmentations as concepts, and defining collections as concepts.

Window 1401 shows how segmentations are defined as
concepts. As shown in FIG. 8, screen 805 permits an analyst
to give the segments of a collection names 809. When the
analyst selects the "Define" button of section 1107 of the
Analysis Work Area of FIG. 11 after having named the
segments, screen 1401 appears. By entering names in field
1403, the analyst can specify the names for the concepts 311
corresponding to the segments. Once the names have been
entered, the analyst can name the concepts by pushing the
"Define" button 1405.

Window 1413 shows how the analyst can define a concept
from a collection. As indicated by button 1407, system 201
maintains a menu of collections. When the analyst selects
button 1407 and then selects a collection from the menu
displayed in response to button 1407, the name of the
collection appears in field 1409. The analyst can then name
the concept corresponding to the collection by typing the

Thereupon, window 1519 for the selected monitor appears.
The window describes the monitor 901 to which it
corresponds and includes a comment 1521 indicating why the
change is interesting. If the analyst wishes to investigate
further, she can select button 1523 to see the individuals in
CIS 905. The analyst can further convert the collection to a
concept by selecting make concept button 1525.

Operation of the User Interface: FIGS. 11 and 12

Consider a data analyst who is interested in exploring the
general patterns of customers. The analyst wants to
determine whether customers can be grouped into categories
such as "regular", "semi-regular", and "infrequent", which
are useful for predicting customer activity and targeting
[...]. FIG. 11 shows some of the [...]
1101 which will be displayed in graphical user interface 203
in such an exploration.

The analyst begins by browsing the domain model (shown
in window 1117), locating the CUSTOMER concept, and
displaying it in a concept-at-a-glance window 1119. This
window displays aggregate information about the set of all
customers, in this case the minimum, maximum, and
average of the numeric role total-spent-1991. She then goes to
work on Customers in analysis work area 1103. Instead of
typing a query in 1105, she begins to segment the set of
customers by using the form Segment by Numeric Attribute
(screen 1109), which has been selected from the "Library of
Abstract Queries" shown in window 1113. To fill out the
form, the analyst specifies the concept to be segmented
(CUSTOMER), the role on which to key the segmentation
(total-spent-1991), and the attribute values that determine
the segments. We assume that the analyst wants to divide
Customers into three approximately equal groups,
corresponding roughly to low, medium, and high spenders, so she
must supply two numbers, say, 500 and 1500. This will
result in a segmentation of customers into three classes:
those who spent less than $500, those that spent between
$500 and $1500, and those that spent more than $1500. Note
that the numeric bounds selected, here 500 and 1500, are
only best guesses: it is only through further analysis (and
perhaps changing the bounds) that the utility of any
segmentation can be determined. The results of the
segmentation are displayed in an analysis table window 1121. The
query and the view it produces are related by an ID#, in this
case, 7228. Table 1121 shows the three segments and the
number of customers that fell into each segment.

Let us assume that table 1121 indicates that there are
many customers who spent only a small amount at the store
name for the concept in field 1411 and selecting "Define" in 1991; this suggests a class of customers who arc not 
button 1405. regular customers. To explore the relationship between 

Defining Monitors: FIG. 15 amount of money spent and regularity of purchasing, the 

FIG. 15 shows the windows used to define monitors 901 50 analyst again segments Customers using the Segment by 
and observe the changes reported by the monitors. Window Numeric Attribute form, this time based on the role number- 
1501 is used to define a monitor. The input to field 1503 of-purchases-in-1991, to create segments for incidental, 
gives the monitor a name; the inputs to fields 1505 and 1507 semi-regular, and regular purchasers. Suppose the analyst 
define the type of the monitor and the concepts to which it next displays a table of the incidental purchasers and dis- 

applies. In this case, the monitor reacts to individuals 55 covers that some spent quite a lot while other spent very 
coming into the concept Sale-Customers. In a preferred little. She now may form the hypothesis that the high 
embodiment, monitors 901 will notify the analyst whenever spenders are more likely to make purchases during sales, 
cither a critical number or critical percentage of changes is To investigate this hypothesis, the analyst edits the table 
reached; which it is to do, and what the the number or of incidental purchasers to show not only the amount they 

percentage is to be is defined in fields 1509 and 1511. 60 spent, but also the percent of purchases they made during 
Selecting button 1513 creates the monitor 901 defined by the sales. She then can specify that she wants to see a scatter plot 
fields and adds it to monitors 305. of the amount spent vs. the percent sale purchases for each 

After the data in individuals 313 has been updated, incidental purchaser. If the scatter plot indicates a positive 
window 1515 displays a list of monitors 901 for which there correlation between the percent sale purchases and the 

have been changes requiring notification. As indicated by 65 amount spent, the analyst may recommend that the store 
1517, the monitors are listed by name. To view the changes, increase the number or length of sales it holds or that it 
the user selects one of the names in window 1515. advertise sales more extensively. 
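The Segment by Numeric Attribute step in the example (role total-spent-1991, bounds 500 and 1500, three resulting classes) can be illustrated with a minimal Python sketch. It is not the query that the preferred embodiment generates: the function name `segment_by_numeric_attribute`, the dictionary representation of individuals, the sample customers, and the segment names are all assumptions made for illustration. The text does not say which segment a customer who spent exactly $500 or $1500 belongs to; the sketch places such a customer in the upper segment.

```python
from bisect import bisect_right


def segment_by_numeric_attribute(individuals, role, bounds, names):
    """Split individuals into len(bounds) + 1 segments keyed on a numeric role.

    All names here are hypothetical; individuals are plain dicts.
    """
    if len(names) != len(bounds) + 1:
        raise ValueError("need exactly one segment name per interval")
    cuts = sorted(bounds)
    segments = {name: [] for name in names}
    for individual in individuals:
        # Assumption: a value equal to a bound falls into the upper segment.
        index = bisect_right(cuts, individual[role])
        segments[names[index]].append(individual["name"])
    return segments


# Invented sample data, standing in for individuals of the CUSTOMER concept.
customers = [
    {"name": "c1", "total-spent-1991": 120},
    {"name": "c2", "total-spent-1991": 500},
    {"name": "c3", "total-spent-1991": 980},
    {"name": "c4", "total-spent-1991": 2400},
]

table = segment_by_numeric_attribute(
    customers, "total-spent-1991", [500, 1500], ["low", "medium", "high"]
)
# Analogue of the analysis table: segment name and number of customers in it.
counts = {name: len(members) for name, members in table.items()}
print(counts)
```

Changing the bounds and re-running the function corresponds to the analyst revising her best-guess values of 500 and 1500.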
Finally, assume that the analyst decides that it is appropriate
to permanently track the size and makeup of some of
these segments. She can create Classic concepts for the 
regular purchaser and high spender segments. The table 
which shows the high spender segment is shown at 1123. By 
filling out a Monitor Change window 1501, she can specify 
that she wants to be informed whenever 5% of the customers 
in the (newly created) Regular-Purchaser concept migrate 
out of the concept. When incremental updates to the knowledge
base are processed, all changes to the classification of
individuals in the knowledge base are recorded, and if any 
of the conditions specified by the analyst are met, the analyst 
will be notified in window 1115. The store then can take 
proper action. Much of the foregoing is summarized in FIG. 
12, which shows a partial roadmap 1201 of the interaction 
between the analyst and the user interface.
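The monitor condition described above (notify when 5% of the customers in Regular-Purchaser migrate out of the concept) can be sketched as follows. This is only an illustration under stated assumptions, not the monitor mechanism of the preferred embodiment: the `Monitor` class, its field names, the representation of classifications as sets of individual names, and the choice to measure the percentage against the concept's size before the update are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """Hypothetical monitor on one concept; field names are assumptions."""
    name: str
    concept: str
    direction: str        # "in": individuals entering; "out": individuals leaving
    threshold: float      # critical number or critical percentage
    as_percentage: bool

    def check(self, before, after):
        """Return the individuals that moved if the threshold is reached, else [].

        before/after map a concept name to the set of its individuals,
        as classified before and after an incremental update.
        """
        old = before.get(self.concept, set())
        new = after.get(self.concept, set())
        moved = (old - new) if self.direction == "out" else (new - old)
        if self.as_percentage:
            # Assumption: the percentage is relative to the size before the update.
            amount = 100.0 * len(moved) / len(old) if old else (100.0 if moved else 0.0)
        else:
            amount = len(moved)
        return sorted(moved) if moved and amount >= self.threshold else []


# Invented data: 20 regular purchasers, one of whom is reclassified out.
before = {"Regular-Purchaser": {f"c{i:02d}" for i in range(1, 21)}}
after = {"Regular-Purchaser": before["Regular-Purchaser"] - {"c01"}}

monitor = Monitor("regulars-leaving", "Regular-Purchaser", "out", 5.0, True)
notified = monitor.check(before, after)
print(notified)
```

One of twenty customers leaving is exactly 5%, so the sketch reports the change; with an unchanged classification it reports nothing.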

CONCLUSION 

The foregoing Detailed Description has disclosed to those 
of ordinary skill in the arts to which information retrieval 
apparatus 201 pertains how to build and use such an apparatus.
In the course of that disclosure, it has been further
shown to those of ordinary skill in the art how to convert a 
query into a concept, how to create and use a monitor, and 
how to use a graph to define a query. While the techniques 
for building and using the apparatus disclosed herein are the
best presently known to the inventors, other implementa- 
tions will be immediately apparent to those of ordinary skill 
in the art. 

For example, the preferred embodiment employs the 
CLASSIC knowledge base management system; however, a 
virtual data base management system can be constructed 
using any kind of knowledge base system or even an 
ordinary data base management system to implement the 
virtual schema and the virtual data base. Similarly, the 
techniques employed to derive a query from a graph can be 
practiced in any kind of data base management system while 
monitors can be used in any knowledge base management 
system which can reclassify its data. The conversion of a 
query to a concept, finally, can be accomplished in any 
knowledge base management system which is able to add a 
new concept. Additionally, other algorithms and data structures
may be used to attain the same ends as the ones
disclosed herein and the apparatus may be implemented in
systems having other kinds of user interfaces than the 
graphical user interface disclosed herein. 

All of the above being the case, the foregoing Detailed 
Description is to be understood as being in every respect 
illustrative and exemplary, but not restrictive, and the scope 
of the invention disclosed herein is not to be determined 
from the Detailed Description, but rather from the claims as
interpreted in light of the Detailed Description and in 
accordance with the Doctrine of Equivalents. 

What is claimed is: 

1. In an information retrieval system, a method for 
organizing a body of information comprising: 

establishing a knowledge base containing said body of 
information, said knowledge base including descrip- 
tions of individuals and of concepts to which the 
individuals belong, the descriptions being classified 
into a generalization ordering; 

responding to a query specifying a collection of the 
individuals by making a collection specification which 
is one of the descriptions and which specifies the 
collection of individuals; and 

receiving the collection specification and classifying the
collection specification into the generalization order- 
ing. 



2. In an information retrieval system including a knowl- 
edge base containing a body of information, the knowledge 
base including descriptions of individuals and of concepts to 
which the individuals belong, the descriptions being classi- 
fied into a generalization ordering, a method for detecting a 
change in the body of information contained in the knowl- 
edge base, the method comprising the steps of: 

making an alteration with regard to one or more of the 
individuals; 

making a change in a relationship between the one or 
more individuals and the concepts as required by the 
alteration and the concepts; and 
indicating an effect of the change with regard to one or 
more of the individuals. 

3. A method for retrieving data from one or more database 
management systems, the method comprising the steps of: 

organizing the data in a knowledge base which includes 
descriptions of individuals and of concepts to which the 
individuals belong, the descriptions being classified 
into a generalization ordering, and operating on the 
data in response to expressions stated in a description 
language which employs the concepts; 
receiving the expressions from the knowledge base, trans- 
lating the expressions into first queries and providing 
the first queries to the database management systems, 
receiving the data from the database management 
systems, and incorporating the received data together 
with the expressions into the knowledge base; and 
receiving second queries specifying certain of the data 
and responding thereto by translating the second que- 
ries into expressions which specify retrieval of the 
certain data from the knowledge base, and providing 
the certain data. 

4. A program storage device, readable by a computer 
having a memory, tangibly embodying one or more pro- 
grams of instructions for organizing a body of information 
within an information retrieval system, said programs of 
instruction being executable by the computer to perform the 
steps of: 

establishing a knowledge base containing said body of 
information, said knowledge base including descrip- 
tions of individuals and of concepts to which the 
individuals belong, the descriptions being classified 
into a generalization ordering; 
responding to a query specifying a collection of the 
individuals by making a collection specification which 
is one of the descriptions and which specifies the 
collection of individuals; and 
receiving the collection specification and classifying the 
collection specification into the generalization order- 
ing. 

5. A program storage device, readable by a computer 
having a memory and an information retrieval system 
including a knowledge base containing a body of
information, the knowledge base including descriptions of
individuals and of concepts to which the individuals belong,
the descriptions being classified into a generalization
ordering, said program storage device tangibly embodying
one or more programs of instructions for detecting a change
in the body of information contained in the knowledge base,
said programs of instruction being executable by the com- 
puter to perform the steps of: 
making an alteration with regard to one or more of the 
individuals; 

making a change in a relationship between the one or 
more individuals and the concepts as required by the 
alteration and the concepts; and 






indicating an effect of the change with regard to one or
more of the individuals.
6. A program storage device, readable by a computer 
having a memory and including one or more database 
management systems, said program storage device tangibly
embodying one or more programs of instructions for retriev- 
ing data from said database management systems, said 
programs of instruction being executable by the computer to 
perform the steps of: 

organizing the data in a knowledge base which includes
descriptions of individuals and of concepts to which the 
individuals belong, the descriptions being classified 
into a generalization ordering, and operating on the 
data in response to expressions stated in a description 
language which employs the concepts; 
receiving the expressions from the knowledge base, translating
the expressions into first queries and providing
the first queries to the database management systems, 
receiving the data from the database management 
systems, and incorporating the received data together 
with the expressions into the knowledge base; and 

receiving second queries specifying certain of the data 
and responding thereto by translating the second que- 
ries into expressions which specify retrieval of the 
certain data from the knowledge base, and providing 
the certain data. 